
Conversation

markurtz (Collaborator)

Summary

Introduces a comprehensive mock server implementation that simulates OpenAI and vLLM APIs with configurable timing characteristics and response patterns. The mock server enables realistic performance testing and validation of GuideLLM benchmarking workflows without requiring actual model deployments, supporting both streaming and non-streaming endpoints with proper token counting, latency simulation (TTFT/ITL), and error handling.

Details

  • Added mock_server package with modular architecture including configuration, handlers, models, server, and utilities
  • Implemented MockServerConfig with Pydantic settings for centralized configuration management supporting environment variables
  • Created HTTP request handlers for OpenAI-compatible endpoints:
    • ChatCompletionsHandler for /v1/chat/completions with streaming support
    • CompletionsHandler for /v1/completions legacy endpoint
    • TokenizerHandler for vLLM-compatible /tokenize and /detokenize endpoints
  • Added comprehensive Pydantic models for request/response validation compatible with both OpenAI and vLLM API specifications
  • Implemented high-performance Sanic-based server with CORS support, middleware, and proper error handling
  • Created mock tokenizer and text generation utilities with deterministic token generation for reproducible testing
  • Added timing generators for realistic latency simulation including TTFT (Time To First Token) and ITL (Inter-Token Latency)
  • Included comprehensive test suite with integration tests using real HTTP server instances
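The TTFT/ITL latency simulation mentioned above can be sketched roughly as follows. This is an illustrative sketch with hypothetical parameter names, not the actual timing generators from the mock_server package:

```python
import asyncio
import random


async def stream_tokens(tokens, ttft_ms=120.0, itl_ms=15.0, jitter=0.1):
    """Yield tokens with simulated latency.

    ttft_ms: mean Time To First Token; itl_ms: mean Inter-Token Latency.
    jitter adds Gaussian noise around each mean so runs are not perfectly
    uniform. (Hypothetical sketch; the real package's config fields and
    distributions may differ.)
    """
    def delay(mean_ms):
        # Sample a non-negative delay in seconds around the given mean.
        return max(0.0, random.gauss(mean_ms, mean_ms * jitter)) / 1000.0

    if not tokens:
        return
    await asyncio.sleep(delay(ttft_ms))  # TTFT before the first token
    yield tokens[0]
    for tok in tokens[1:]:
        await asyncio.sleep(delay(itl_ms))  # ITL between subsequent tokens
        yield tok
```

Driving this generator from a streaming handler reproduces the pacing a real model server would exhibit, which is what makes benchmark timing measurements meaningful against a mock.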

Test Plan

  • Unit and integration tests added to the automated test suite

Related Issues

  • Part of the larger scheduler refactor initiative

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Copilot AI (Contributor) left a comment


Pull Request Overview

Introduces a comprehensive mock server implementation that simulates OpenAI and vLLM APIs with configurable timing characteristics and response patterns. This enables realistic performance testing and validation of GuideLLM benchmarking workflows without requiring actual model deployments.

  • Modular architecture with configuration, handlers, models, server, and utilities components
  • HTTP request handlers for OpenAI-compatible endpoints with streaming and non-streaming support
  • High-performance Sanic-based server with CORS support and proper error handling
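For context on the streaming support noted above: OpenAI-style streaming frames each token delta as a server-sent-events data line. A minimal formatter might look like this (a sketch covering only a subset of the chat.completion.chunk fields, not the project's actual models.py schema):

```python
import json
import time
import uuid

DONE = "data: [DONE]\n\n"  # terminal SSE frame in the OpenAI streaming protocol


def sse_chunk(token: str, model: str = "mock-model") -> str:
    """Format one generated token as an OpenAI-style SSE data line.

    Illustrative sketch: field subset only, and the id scheme here is
    arbitrary, not what the real mock server emits.
    """
    payload = {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {"index": 0, "delta": {"content": token}, "finish_reason": None}
        ],
    }
    return f"data: {json.dumps(payload)}\n\n"
```

A streaming handler would send one such frame per token, then the `DONE` sentinel, over a `text/event-stream` response.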

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Per-file summary:
  • src/guidellm/mock_server/__init__.py - Package initialization exposing the main MockServer and MockServerConfig classes
  • src/guidellm/mock_server/config.py - Pydantic-based configuration management with environment variable support
  • src/guidellm/mock_server/handlers/__init__.py - Handler module initialization exposing the request handlers
  • src/guidellm/mock_server/handlers/chat_completions.py - OpenAI chat completions endpoint implementation with streaming support
  • src/guidellm/mock_server/handlers/completions.py - Legacy OpenAI completions endpoint with timing simulation
  • src/guidellm/mock_server/handlers/tokenizer.py - vLLM-compatible tokenization and detokenization endpoints
  • src/guidellm/mock_server/models.py - Pydantic models for request/response validation and API compatibility
  • src/guidellm/mock_server/server.py - Sanic-based HTTP server with middleware, routes, and error handling
  • src/guidellm/mock_server/utils.py - Mock tokenizer and text generation utilities for testing
  • tests/unit/mock_server/__init__.py - Test package initialization
  • tests/unit/mock_server/test_server.py - Comprehensive integration tests using real HTTP server instances
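A deterministic mock tokenizer of the kind utils.py provides can be as simple as hashing whitespace-split tokens to stable ids, which keeps token counts reproducible across runs. This is an illustrative sketch, not the actual implementation:

```python
import hashlib
import re


class MockTokenizer:
    """Toy tokenizer with deterministic ids (illustrative only; class and
    method names here are assumptions, not the real utils.py API)."""

    VOCAB_SIZE = 50_000  # arbitrary cap for the fake id space

    def tokenize(self, text: str) -> list[str]:
        # Whitespace-based splitting stands in for real subword tokenization.
        return re.findall(r"\S+", text)

    def encode(self, text: str) -> list[int]:
        # Hash each token to a stable id so repeated runs reproduce exactly.
        return [
            int(hashlib.sha256(tok.encode()).hexdigest(), 16) % self.VOCAB_SIZE
            for tok in self.tokenize(text)
        ]

    def count(self, text: str) -> int:
        return len(self.tokenize(text))
```

Because ids derive from a content hash rather than a random seed, the /tokenize and /detokenize style endpoints backed by such a tokenizer return identical results on every run, which is what makes the benchmarks' token accounting verifiable.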


@markurtz markurtz force-pushed the features/refactor/benchmarker branch from 2515465 to 4834767 Compare September 19, 2025 12:20
@markurtz markurtz force-pushed the features/refactor/mock-server branch from 841e82c to b1cce19 Compare September 19, 2025 12:22
@markurtz markurtz force-pushed the features/refactor/mock-server branch from ca2be85 to bb98193 Compare September 19, 2025 12:31
sjmonson added a commit that referenced this pull request Sep 23, 2025
…nto features/refactor/base-draft

[GuideLLM Refactor] mock server package creation #357
sjmonson added a commit that referenced this pull request Sep 25, 2025
Base automatically changed from features/refactor/benchmarker to features/refactor/base September 29, 2025 14:19
@markurtz markurtz merged commit da02ee8 into features/refactor/base Sep 29, 2025
11 of 17 checks passed
@markurtz markurtz deleted the features/refactor/mock-server branch September 29, 2025 14:19
@markurtz markurtz added this to the v0.4.0 milestone Oct 1, 2025
Labels: none
Projects: none
2 participants